You work at a startup that sells food products. You need to investigate user behavior for the company's app.

First, study the sales funnel. Find out how users reach the purchase stage. How many users actually make it to this stage? How many get stuck at previous stages? Which stages in particular?

Then look at the results of an A/A/B test. (Read on for more information about A/A/B testing.) The designers would like to change the fonts for the entire app, but the managers are afraid the users might find the new design intimidating. They decide to make a decision based on the results of an A/A/B test. The users are split into three groups: two control groups get the old fonts and one test group gets the new ones. Find out which set of fonts produces better results.

Creating two A groups has certain advantages. We can make it a principle that we will only be confident in the accuracy of our testing when the two control groups are similar. If there are significant differences between the A groups, this can help us uncover factors that may be distorting the results. Comparing control groups also tells us how much time and data we'll need when running further tests.

You'll be using the same dataset for general analytics and for A/A/B analysis. In real projects, experiments are constantly being conducted. Analysts study the quality of an app using general data, without paying attention to whether users are participating in experiments.
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats as st
import numpy as np
import math as mth
import random
from statsmodels.stats.proportion import proportions_ztest
data_logs=pd.read_csv("/Users/edeng/Downloads/logs_exp_us.csv", sep="\t")
data_logs.info()
data_logs.columns=['event_name', 'user_id', 'event_timestamp', 'exp_id']
#checking for duplicates
print()
print('The share of duplicates is: {:.2f}%'.format(data_logs.duplicated().sum() / len(data_logs) * 100))
print()
#dropping duplicates
data_logs.drop_duplicates(inplace=True)
data_logs = data_logs.reset_index(drop=True)
#checking NaN values
data_logs.isnull().values.any()
#creating a datetime column from the timestamp column
data_logs['datetime']=data_logs['event_timestamp'].apply(lambda t: dt.datetime.fromtimestamp(t))
#adding date column
data_logs['date'] = data_logs['datetime'].dt.strftime('%Y-%m-%d')
data_logs.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244126 entries, 0 to 244125
Data columns (total 4 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   EventName       244126 non-null  object
 1   DeviceIDHash    244126 non-null  int64
 2   EventTimestamp  244126 non-null  int64
 3   ExpId           244126 non-null  int64
dtypes: int64(3), object(1)
memory usage: 7.5+ MB

The share of duplicates is: 0.17%
| | event_name | user_id | event_timestamp | exp_id | datetime | date |
|---|---|---|---|---|---|---|
| 0 | MainScreenAppear | 4575588528974610257 | 1564029816 | 246 | 2019-07-25 07:43:36 | 2019-07-25 |
| 1 | MainScreenAppear | 7416695313311560658 | 1564053102 | 246 | 2019-07-25 14:11:42 | 2019-07-25 |
| 2 | PaymentScreenSuccessful | 3518123091307005509 | 1564054127 | 248 | 2019-07-25 14:28:47 | 2019-07-25 |
| 3 | CartScreenAppear | 3518123091307005509 | 1564054127 | 248 | 2019-07-25 14:28:47 | 2019-07-25 |
| 4 | PaymentScreenSuccessful | 6217807653094995999 | 1564055322 | 248 | 2019-07-25 14:48:42 | 2019-07-25 |
#checking whether any users appear in more than one experiment group
users=data_logs.groupby(['user_id','exp_id'])['event_name'].count().reset_index()
onBothGroups=users[users['user_id'].duplicated()]
onBothGroups
| user_id | exp_id | event_name |
|---|---|---|
I have loaded the logs, renamed the columns to lowercase, found that 0.17% of the rows are duplicates, and dropped them. I also created two new columns, datetime and date. Finally, I checked whether any users appear in more than one group; none do.
data_logs['event_timestamp'].count()
243713
data_logs['user_id'].nunique()
7551
data_logs.groupby('user_id')['event_name'].count().mean()
32.27559263673685
print('The logs data period starts at:', data_logs['datetime'].min())
print('The logs data period ends at:',data_logs['datetime'].max())
The logs data period starts at: 2019-07-25 07:43:36
The logs data period ends at: 2019-08-08 00:15:17
fig = px.histogram(data_logs, x="datetime", nbins=100, title="Number of events per date")
fig.show()
We can see from the histogram above that the data only becomes complete from August 1st: from that date on there is a regular cyclical pattern, while before it the daily counts are much lower than the minimum of any later day. So I am removing the data before August 1st, which leaves one week of data, from the 1st to the 7th of August.
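As a sketch of how this cutoff could be found programmatically rather than by eye, one can look for the first day whose event volume reaches a sizable fraction of the peak day. The daily numbers below are hypothetical, only echoing the shape of the histogram (a sparse ramp-up, then a stable plateau):

```python
import pandas as pd

# Hypothetical daily event counts mirroring the pattern described above:
# very low volume before 2019-08-01, then a stable plateau.
daily_counts = pd.Series(
    [120, 350, 900, 2100, 4800, 9500, 34000, 36500, 35200, 36900, 35800, 34700, 36100],
    index=pd.date_range('2019-07-26', periods=13),
)

# Treat a day as "complete" once its volume reaches at least half of the peak day.
threshold = daily_counts.max() / 2
cutoff = daily_counts[daily_counts >= threshold].index.min()
print(cutoff.date())  # 2019-08-01
```

The half-of-peak threshold is an arbitrary choice for this sketch; any rule that separates the ramp-up tail from the plateau would do.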
data_logs=data_logs[data_logs['datetime']>='2019-08-01'].reset_index()
data_logs.head()
| | index | event_name | user_id | event_timestamp | exp_id | datetime | date |
|---|---|---|---|---|---|---|---|
| 0 | 1989 | MainScreenAppear | 7701922487875823903 | 1564606857 | 247 | 2019-08-01 00:00:57 | 2019-08-01 |
| 1 | 1990 | MainScreenAppear | 2539077412200498909 | 1564606905 | 247 | 2019-08-01 00:01:45 | 2019-08-01 |
| 2 | 1991 | OffersScreenAppear | 3286987355161301427 | 1564606941 | 248 | 2019-08-01 00:02:21 | 2019-08-01 |
| 3 | 1992 | OffersScreenAppear | 3187166762535343300 | 1564606943 | 247 | 2019-08-01 00:02:23 | 2019-08-01 |
| 4 | 1993 | MainScreenAppear | 1118952406011435924 | 1564607005 | 248 | 2019-08-01 00:03:25 | 2019-08-01 |
#comparing the remaining events with the original total of 243,713
remaining_events = data_logs['event_name'].count()
lost_events = 243713 - remaining_events
print('The percentage of events lost is: {:.2f}%'.format(lost_events / 243713 * 100))
print('The number of lost events is:', lost_events)
The percentage of events lost is: 0.82%
The number of lost events is: 1989
We have lost 1,989 events (243713 - 241724 = 1989), which is about 0.82% of the original data.
#comparing the remaining users with the original total of 7,551
remaining_users = data_logs['user_id'].nunique()
lost_users = 7551 - remaining_users
print('The percentage of users lost is: {:.2f}%'.format(lost_users / 7551 * 100))
print('The number of lost users is:', lost_users)
The percentage of users lost is: 0.17%
The number of lost users is: 13
We have lost 13 users (7551 - 7538 = 13), which is about 0.17% of all users.
data_logs.groupby('exp_id')['user_id'].nunique()
exp_id
246    2484
247    2517
248    2537
Name: user_id, dtype: int64
events_freq=data_logs.groupby('event_name')['event_timestamp'].nunique().sort_values(ascending=False).reset_index()
events_freq
| | event_name | event_timestamp |
|---|---|---|
| 0 | MainScreenAppear | 104172 |
| 1 | OffersScreenAppear | 44180 |
| 2 | CartScreenAppear | 40248 |
| 3 | PaymentScreenSuccessful | 32619 |
| 4 | Tutorial | 1010 |
MainScreenAppear has the highest frequency, followed by OffersScreenAppear, CartScreenAppear, PaymentScreenSuccessful and lastly Tutorial.
users_freq=data_logs.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).reset_index()
users_number=data_logs['user_id'].nunique()
users_freq['at_least_1_%']=users_freq['user_id']/users_number*100
users_freq
| | event_name | user_id | at_least_1_% |
|---|---|---|---|
| 0 | MainScreenAppear | 7423 | 98.474396 |
| 1 | OffersScreenAppear | 4597 | 60.984346 |
| 2 | CartScreenAppear | 3736 | 49.562218 |
| 3 | PaymentScreenSuccessful | 3540 | 46.962059 |
| 4 | Tutorial | 843 | 11.183338 |
Per the above, the MainScreenAppear event was performed at least once by 98.47% of all users, while Tutorial was performed at least once by only 11.18% of users.
The way I see it, the sequence is: MainScreenAppear--->OffersScreenAppear--->CartScreenAppear--->PaymentScreenSuccessful. Tutorial is not part of the funnel, as too few users performed it; most likely the tutorial is optional.
#removing the Tutorial from the funnel
event_funnel=users_freq.drop(users_freq.tail(1).index)
fig = go.Figure(go.Funnel(x=event_funnel['user_id'], y=event_funnel['event_name'], textinfo = "value+percent previous"))
fig.update_layout(title='share of users that proceed from each stage to the next')
fig.show()
We can see that 62% of the users from MainScreenAppear proceed to OffersScreenAppear; of those, 81% proceed to CartScreenAppear, and finally 95% of those make a purchase.
We lose the most users at the MainScreenAppear stage, as only 62% proceed from there to the next stage.
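These shares can be double-checked directly from the unique-user counts in the funnel table above, in a small self-contained sketch (the counts are copied from that table):

```python
# Unique-user counts per funnel stage, taken from the table above
funnel_users = {
    'MainScreenAppear': 7423,
    'OffersScreenAppear': 4597,
    'CartScreenAppear': 3736,
    'PaymentScreenSuccessful': 3540,
}

stages = list(funnel_users.items())
for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
    # share of users at the previous stage who reached this one
    print(f'{prev_name} -> {name}: {count / prev_count:.0%}')

# overall conversion from the first stage to payment
print(f'Overall conversion: {stages[-1][1] / stages[0][1]:.0%}')
```

This reproduces the 62%, 81%, and 95% stage-to-stage shares, and the 48% overall conversion.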
fig = go.Figure(go.Funnel(x=event_funnel['user_id'], y=event_funnel['event_name'], textinfo="value+percent initial"))
fig.update_layout(title='Share of users who make the entire journey from their first event to payment')
fig.show()
Per the above, 48% of users make the entire journey and reach the payment screen.
data_logs.groupby('exp_id')['user_id'].nunique()
exp_id
246    2484
247    2517
248    2537
Name: user_id, dtype: int64
H0: The proportions of users performing a given event are equal in the two groups.
H1: The proportions of users performing a given event are different in the two groups.
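As a sanity check on what proportions_ztest computes, the pooled two-proportion z statistic can be worked out by hand on the MainScreenAppear counts reported below for groups 246 and 247 (2,450 of 2,484 users vs. 2,479 of 2,517 users):

```python
import math

successes = [2450, 2479]  # users with MainScreenAppear in groups 246 and 247
samples = [2484, 2517]    # unique users in each group

p1 = successes[0] / samples[0]
p2 = successes[1] / samples[1]
# Pooled proportion under H0 (the two shares are equal)
p_pooled = sum(successes) / sum(samples)
# Standard error of the difference between the two sample proportions
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / samples[0] + 1 / samples[1]))
z = (p1 - p2) / se
print(round(z, 3))  # 0.418, matching the z_stat reported below
```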
def AAtest(data_logs):
    group_246 = data_logs[data_logs['exp_id'] == 246]
    group_247 = data_logs[data_logs['exp_id'] == 247]
    # unique users per event in each group (Tutorial is not part of the funnel)
    event_246 = group_246.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    event_247 = group_247.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    # total unique users in each group
    users_246 = group_246['user_id'].nunique()
    users_247 = group_247['user_id'].nunique()
    alpha = 0.05
    for i in range(len(event_246)):
        print('The number of users who perform the event {} from group 246 is: {}'.format(event_246.index[i], event_246.values[i]))
        print('The share of users that performed the event {} in group 246 is: {:.2f}%'.format(event_246.index[i], event_246.values[i] / users_246 * 100))
        print()
        print('The number of users who perform the event {} from group 247 is: {}'.format(event_247.index[i], event_247.values[i]))
        print('The share of users that performed the event {} in group 247 is: {:.2f}%'.format(event_247.index[i], event_247.values[i] / users_247 * 100))
        print()
        # two-sided z-test for equality of the two proportions
        successes = np.array([event_246.values[i], event_247.values[i]])
        samples = np.array([users_246, users_247])
        stat, p_value = proportions_ztest(count=successes, nobs=samples, alternative='two-sided')
        print('z_stat: %0.3f, p_value: %0.3f' % (stat, p_value))
        if p_value > alpha:
            print("Fail to reject the null hypothesis")
        else:
            print("Reject the null hypothesis - suggest the alternative hypothesis is true")
        print()
        print('------------------------------------------------')
AAtest(data_logs)
The number of users who perform the event MainScreenAppear from group 246 is: 2450
The share of users that performed the event MainScreenAppear in group 246 is: 98.63%

The number of users who perform the event MainScreenAppear from group 247 is: 2479
The share of users that performed the event MainScreenAppear in group 247 is: 98.49%

z_stat: 0.418, p_value: 0.676
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event OffersScreenAppear from group 246 is: 1542
The share of users that performed the event OffersScreenAppear in group 246 is: 62.08%

The number of users who perform the event OffersScreenAppear from group 247 is: 1524
The share of users that performed the event OffersScreenAppear in group 247 is: 60.55%

z_stat: 1.110, p_value: 0.267
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event CartScreenAppear from group 246 is: 1266
The share of users that performed the event CartScreenAppear in group 246 is: 50.97%

The number of users who perform the event CartScreenAppear from group 247 is: 1239
The share of users that performed the event CartScreenAppear in group 247 is: 49.23%

z_stat: 1.231, p_value: 0.218
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event PaymentScreenSuccessful from group 246 is: 1200
The share of users that performed the event PaymentScreenSuccessful in group 246 is: 48.31%

The number of users who perform the event PaymentScreenSuccessful from group 247 is: 1158
The share of users that performed the event PaymentScreenSuccessful in group 247 is: 46.01%

z_stat: 1.631, p_value: 0.103
Fail to reject the null hypothesis

------------------------------------------------
We can see that in the A/A test there was no statistically significant difference between the two control groups on any event. This is great: we can now be confident that the group split was done correctly.
def ABtest1(data_logs):
    group_246 = data_logs[data_logs['exp_id'] == 246]
    group_248 = data_logs[data_logs['exp_id'] == 248]
    # unique users per event in each group (Tutorial is not part of the funnel)
    event_246 = group_246.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    event_248 = group_248.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    # total unique users in each group
    users_246 = group_246['user_id'].nunique()
    users_248 = group_248['user_id'].nunique()
    alpha = 0.05
    for i in range(len(event_246)):
        print('The number of users who perform the event {} from group 246 is: {}'.format(event_246.index[i], event_246.values[i]))
        print('The share of users that performed the event {} in group 246 is: {:.2f}%'.format(event_246.index[i], event_246.values[i] / users_246 * 100))
        print()
        print('The number of users who perform the event {} from group 248 is: {}'.format(event_248.index[i], event_248.values[i]))
        print('The share of users that performed the event {} in group 248 is: {:.2f}%'.format(event_248.index[i], event_248.values[i] / users_248 * 100))
        print()
        # two-sided z-test for equality of the two proportions
        successes = np.array([event_246.values[i], event_248.values[i]])
        samples = np.array([users_246, users_248])
        stat, p_value = proportions_ztest(count=successes, nobs=samples, alternative='two-sided')
        print('z_stat: %0.3f, p_value: %0.3f' % (stat, p_value))
        if p_value > alpha:
            print("Fail to reject the null hypothesis")
        else:
            print("Reject the null hypothesis - suggest the alternative hypothesis is true")
        print()
        print('------------------------------------------------')
ABtest1(data_logs)
The number of users who perform the event MainScreenAppear from group 246 is: 2450
The share of users that performed the event MainScreenAppear in group 246 is: 98.63%

The number of users who perform the event MainScreenAppear from group 248 is: 2494
The share of users that performed the event MainScreenAppear in group 248 is: 98.31%

z_stat: 0.940, p_value: 0.347
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event OffersScreenAppear from group 246 is: 1542
The share of users that performed the event OffersScreenAppear in group 246 is: 62.08%

The number of users who perform the event OffersScreenAppear from group 248 is: 1531
The share of users that performed the event OffersScreenAppear in group 248 is: 60.35%

z_stat: 1.258, p_value: 0.208
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event CartScreenAppear from group 246 is: 1266
The share of users that performed the event CartScreenAppear in group 246 is: 50.97%

The number of users who perform the event CartScreenAppear from group 248 is: 1231
The share of users that performed the event CartScreenAppear in group 248 is: 48.52%

z_stat: 1.732, p_value: 0.083
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event PaymentScreenSuccessful from group 246 is: 1200
The share of users that performed the event PaymentScreenSuccessful in group 246 is: 48.31%

The number of users who perform the event PaymentScreenSuccessful from group 248 is: 1182
The share of users that performed the event PaymentScreenSuccessful in group 248 is: 46.59%

z_stat: 1.219, p_value: 0.223
Fail to reject the null hypothesis

------------------------------------------------
There was no statistically significant difference in the A/B test between groups 246 and 248; the new fonts did not have a statistically significant effect.
def ABtest2(data_logs):
    group_247 = data_logs[data_logs['exp_id'] == 247]
    group_248 = data_logs[data_logs['exp_id'] == 248]
    # unique users per event in each group (Tutorial is not part of the funnel)
    event_247 = group_247.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    event_248 = group_248.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    # total unique users in each group
    users_247 = group_247['user_id'].nunique()
    users_248 = group_248['user_id'].nunique()
    alpha = 0.05
    for i in range(len(event_247)):
        print('The number of users who perform the event {} from group 247 is: {}'.format(event_247.index[i], event_247.values[i]))
        print('The share of users that performed the event {} in group 247 is: {:.2f}%'.format(event_247.index[i], event_247.values[i] / users_247 * 100))
        print()
        print('The number of users who perform the event {} from group 248 is: {}'.format(event_248.index[i], event_248.values[i]))
        print('The share of users that performed the event {} in group 248 is: {:.2f}%'.format(event_248.index[i], event_248.values[i] / users_248 * 100))
        print()
        # two-sided z-test for equality of the two proportions
        successes = np.array([event_247.values[i], event_248.values[i]])
        samples = np.array([users_247, users_248])
        stat, p_value = proportions_ztest(count=successes, nobs=samples, alternative='two-sided')
        print('z_stat: %0.3f, p_value: %0.3f' % (stat, p_value))
        if p_value > alpha:
            print("Fail to reject the null hypothesis")
        else:
            print("Reject the null hypothesis - suggest the alternative hypothesis is true")
        print()
        print('------------------------------------------------')
ABtest2(data_logs)
The number of users who perform the event MainScreenAppear from group 247 is: 2479
The share of users that performed the event MainScreenAppear in group 247 is: 98.49%

The number of users who perform the event MainScreenAppear from group 248 is: 2494
The share of users that performed the event MainScreenAppear in group 248 is: 98.31%

z_stat: 0.524, p_value: 0.600
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event OffersScreenAppear from group 247 is: 1524
The share of users that performed the event OffersScreenAppear in group 247 is: 60.55%

The number of users who perform the event OffersScreenAppear from group 248 is: 1531
The share of users that performed the event OffersScreenAppear in group 248 is: 60.35%

z_stat: 0.146, p_value: 0.884
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event CartScreenAppear from group 247 is: 1239
The share of users that performed the event CartScreenAppear in group 247 is: 49.23%

The number of users who perform the event CartScreenAppear from group 248 is: 1231
The share of users that performed the event CartScreenAppear in group 248 is: 48.52%

z_stat: 0.500, p_value: 0.617
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event PaymentScreenSuccessful from group 247 is: 1158
The share of users that performed the event PaymentScreenSuccessful in group 247 is: 46.01%

The number of users who perform the event PaymentScreenSuccessful from group 248 is: 1182
The share of users that performed the event PaymentScreenSuccessful in group 248 is: 46.59%

z_stat: -0.416, p_value: 0.678
Fail to reject the null hypothesis

------------------------------------------------
There was no statistically significant difference in the A/B test between groups 247 and 248; the new fonts did not have a statistically significant effect.
def ABtestcombined(data_logs):
    group_combined = data_logs[(data_logs['exp_id'] == 247) | (data_logs['exp_id'] == 246)]
    group_248 = data_logs[data_logs['exp_id'] == 248]
    # unique users per event in each group (Tutorial is not part of the funnel)
    event_group_combined = group_combined.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    event_248 = group_248.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).drop('Tutorial')
    # total unique users in each group
    users_combined = group_combined['user_id'].nunique()
    users_248 = group_248['user_id'].nunique()
    alpha = 0.05
    for i in range(len(event_group_combined)):
        print('The number of users who perform the event {} from group_combined is: {}'.format(event_group_combined.index[i], event_group_combined.values[i]))
        print('The share of users that performed the event {} in group_combined is: {:.2f}%'.format(event_group_combined.index[i], event_group_combined.values[i] / users_combined * 100))
        print()
        print('The number of users who perform the event {} from group 248 is: {}'.format(event_248.index[i], event_248.values[i]))
        print('The share of users that performed the event {} in group 248 is: {:.2f}%'.format(event_248.index[i], event_248.values[i] / users_248 * 100))
        print()
        # two-sided z-test for equality of the two proportions
        successes = np.array([event_group_combined.values[i], event_248.values[i]])
        samples = np.array([users_combined, users_248])
        stat, p_value = proportions_ztest(count=successes, nobs=samples, alternative='two-sided')
        print('z_stat: %0.3f, p_value: %0.3f' % (stat, p_value))
        if p_value > alpha:
            print("Fail to reject the null hypothesis")
        else:
            print("Reject the null hypothesis - suggest the alternative hypothesis is true")
        print()
        print('------------------------------------------------')
ABtestcombined(data_logs)
The number of users who perform the event MainScreenAppear from group_combined is: 4929
The share of users that performed the event MainScreenAppear in group_combined is: 98.56%

The number of users who perform the event MainScreenAppear from group 248 is: 2494
The share of users that performed the event MainScreenAppear in group 248 is: 98.31%

z_stat: 0.854, p_value: 0.393
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event OffersScreenAppear from group_combined is: 3066
The share of users that performed the event OffersScreenAppear in group_combined is: 61.31%

The number of users who perform the event OffersScreenAppear from group 248 is: 1531
The share of users that performed the event OffersScreenAppear in group 248 is: 60.35%

z_stat: 0.808, p_value: 0.419
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event CartScreenAppear from group_combined is: 2505
The share of users that performed the event CartScreenAppear in group_combined is: 50.09%

The number of users who perform the event CartScreenAppear from group 248 is: 1231
The share of users that performed the event CartScreenAppear in group 248 is: 48.52%

z_stat: 1.287, p_value: 0.198
Fail to reject the null hypothesis

------------------------------------------------
The number of users who perform the event PaymentScreenSuccessful from group_combined is: 2358
The share of users that performed the event PaymentScreenSuccessful in group_combined is: 47.15%

The number of users who perform the event PaymentScreenSuccessful from group 248 is: 1182
The share of users that performed the event PaymentScreenSuccessful in group 248 is: 46.59%

z_stat: 0.460, p_value: 0.645
Fail to reject the null hypothesis

------------------------------------------------
There was no statistically significant difference between the combined control group (246 + 247) and group 248 either. In conclusion, the new fonts did not have a statistically significant effect.
I initially used 0.05 as the significance level, but even when lowering it to 0.005 we still fail to reject the null hypothesis in every test.
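One reason to consider a stricter level is the multiple-comparison problem: 16 z-tests were run in total (4 events across 4 group comparisons), so the chance of at least one false positive at alpha = 0.05 is considerable. A simple Bonferroni sketch illustrates this:

```python
alpha = 0.05
n_tests = 16  # 4 events x 4 comparisons (246 vs 247, 246 vs 248, 247 vs 248, 246+247 vs 248)

# probability of at least one false positive if all 16 null hypotheses are true
family_wise_error = 1 - (1 - alpha) ** n_tests
print(f'Family-wise error rate at alpha=0.05: {family_wise_error:.2f}')  # 0.56

# Bonferroni-corrected per-test significance level
bonferroni_alpha = alpha / n_tests
print(f'Bonferroni-corrected alpha: {bonferroni_alpha:.4f}')  # 0.0031
```

Even at the uncorrected 0.05 level, the smallest p-value observed above is 0.083, so none of the conclusions would change under any of these thresholds.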
In conclusion, I loaded the logs and prepared the data by removing duplicates, checking for NaN values, adjusting the dtypes, and creating datetime and date columns.
I also noticed that the data is incomplete before the 1st of August, so I dropped that period and was left with one week of data.
I found that 0.17% of the rows were duplicates and dropped them, and checked whether any users appear in more than one group; none do.
I also found that the sequence of funnel events is MainScreenAppear--->OffersScreenAppear--->CartScreenAppear--->PaymentScreenSuccessful.
After analyzing the results of the A/A/B test, I found no statistically significant difference between the control and test groups, which means the new fonts did not change user engagement. Therefore it might be more cost-effective to keep the current fonts; or, if we really want to change them, at least we know there will be no negative effect on users.